Abstract. This paper presents a formulation of a
novel methodology for evaluation of testing in
support of operational reliability assessment and
prediction. The methodology features an incremental
evaluation of the representativeness of a set of
development and validation test cases, together
with definition of additional test cases to enhance
those qualities.

If test cases are derived in typical fashion
(i.e., to find and remove bugs, to investigate
software performance under off-nominal conditions,
to exercise structural elements and functional
capabilities of the software, and to demonstrate
satisfaction of software requirements), then the
complete set of test cases is not necessarily
representative of anticipated operational usage.
The paper reports on initial research into
formulation of valid measures of testing
representativeness.

Several techniques which permit specification of
expected operational usage are described, and a
technique for evaluating the correlation between
actual testing accomplished and expected operational
usage is defined. An unbiased estimator for
operational usage reliability is proposed and
justified as a function of a specified operational
profile; confidence in the estimate is derived
from a measure of the degree to which testing is
representative of expected operational application.

An experimental application of the techniques to
a small program is provided as an illustration of
the proposed use of the methodology for operational
software reliability estimation. The relationship
between structural exercise testing thoroughness
and operational usage representativeness is
discussed; the specification of a quantified
reliability requirement and an explicit, required
representativeness measure (or confidence) is
identified as integral to effective application of
the proposed reliability testing methodology;
efforts to extend, formalize and generalize the
methodology are described; and expected benefits,
as well as potential problems and limitations, are
identified.



The software reliability problem 

The field of software quality assurance has 
suffered from confusion, owing to the lack of an 
acceptable definition of reliability and lack of 
means for relating quantitative measures of relia- 
bility to values that reflect actual experience 
with software failures. Because of the complexity 
of software, no adequate model of software relia- 
bility, neither conceptual nor mathematical, has 
been developed. A number of investigators who 
have studied software reliability have attempted 
to develop software models based on formulas and 
concepts borrowed from hardware reliability. 
These attempts have been unsuccessful, largely 
because the relationships between the models and 
actual software properties were not adequately 
established. 



Despite the large amount of effort devoted to 
test and validation, undetected software errors 
continue to be a major concern to both designers 
and users. With the development of real-time 
software systems to control vital and critical 
processes, undetected errors can produce system 
failure with catastrophic results. Nevertheless, 
the goal of achieving 100 percent reliability in
software by exhaustive testing is, in most cases,
prohibited by cost and schedule and is, for the
most part, unrealizable.



It is well known that virtually all operational 
programs still contain errors. In one sense then 
(since they are doomed to fail sooner or later), 
the reliability of such programs is zero (0). It
is, however, equally well known that many computer
programs operate day after day, and have done so
for years without any errors appearing. In a
different sense (i.e., based on the empirical
evidence), one might be inclined to say that the
reliability of such programs is one (1). This
apparent paradox arises because insufficient
attention has been given to the distinction between
reliability in operation (or, our confidence that
the program will run correctly) and reliability
in theory (or, our knowledge that most programs
are not completely error-free).

The main goal of this paper is to introduce a
basic methodology for the evaluation of alternative
software testing strategies which permits
determination of the relevance of the strategies to
quantitative estimation of software reliability. It
is noteworthy that to date the vast majority of
software reliability study and methodology
development has been based upon (and thus depends
upon) software testing. And yet, as formulas have
been generated, data collected, and testing-based
reliability methodologies have subsequently
emerged, the testing activity itself has escaped
rigorous attention. Consider, for instance, the
following conventional testing goals:

1) to find and remove "bugs" 

2) to investigate software performance under 
off-nominal conditions 

3) to exercise structural elements and 
functional capabilities of the software 

4) to demonstrate satisfaction of stated 
software requirements. 

Upon close examination we might more properly view 
(1) and (4) as desirable end results (i.e., goals),
while (2) and (3) describe activities or testing 
approaches thought to be relevant to the achieve- 
ment of those goals. There are empirical results, 
at least, which serve to support an intuitive 
feeling that the "means" and the "ends" are indeed 
related. Unfortunately, precise definition of the 
nature of the relationship has not been addressed, 
and, consequently, the meaning and value of testing 
has eluded the grasp of software reliability 
researchers. Furthermore, there is not yet a gen- 
erally accepted way to begin with a statement about 
the relative "error-freeness" of code and produce a
defensible statement about the probability that 
failures may (or may not) occur during operational 
use of the software. 

If we are to achieve any real success with
testing-based reliability methodologies, it is
mandatory that we identify and learn to cope with
significant sources of uncertainty due to
insufficient knowledge of the "worth" of the testing
accomplished. The remainder of this paper presents
a detailed formulation and illustrated application
of a particular testing-based software
reliability methodology. The methodology
deliberately exposes sources of uncertainty and uses
a measure of the uncertainty in estimation of
operational software reliability as well as
determination of a quantified level of confidence in
the reliability estimate. A typical application of
the methodology to a particular computer program
involves:

1) definition of an operational profile
which characterizes expected program usage
through an appropriate partition (i.e.,
subdivision) of the total "space" of
inputs upon which the program must operate

2) specification of an operational profile
probability distribution which characterizes
and quantifies the relative frequency with
which operational usage will expose the
program to each of the subdivisions of the
input space

3) for a set of test cases developed in
accordance with a conventional testing
strategy, computation of a measure of the
degree to which the set of tests is
representative of expected operational
program usage

4) estimation of operational usage reliability
as a function of actual testing experience
factored by the operational profile
probability distribution, with confidence in
the reliability estimate derived from the
measure of testing representativeness, and

5) finally, augmentation of the initial set of
tests with additional test cases, repeating
steps 3 and 4 as necessary to achieve an
improved reliability estimate and level of
confidence.

A technical approach to reliability testing 

This section presents nomenclature, definitions, 
assumptions and detailed formulation of fundamental 
elements of the above, briefly described repre- 
sentative testing and reliability measurement 
methodology. A few of the underlying concepts 
are not new and have been studied and documented 
in detail [1, 2] as part of a continuing TRW soft- 
ware reliability research program and related con- 
tractual efforts. These concepts (e.g., the Input 
Data Space) are treated here only to the extent 
necessary to provide a complete description of an 
integrated software reliability methodology. 

The input data space. In a formal sense, a computer
program may be defined as a specification of a
computable function on the input expressions of
the program. All input expressions can be
represented in the form of an array of variable names
$(X_1, X_2, \ldots, X_n)$, whose values need to be
specified in order to cause execution of the program.
The Input Data Space, $E$, is therefore defined as
the set of $E_i = (X_1, X_2, \ldots, X_n)_i$,
$i = 1, 2, \ldots, N$, where each $X_j$ can range over
a finite set of values. In the general case, the
number of possible values for which each $X_j$ is
defined can be very large. Consequently, the number,
$N$, of "points" in the Input Data Space, being the
product of the numbers of possible values for each
$X_j$, could be enormous indeed.
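To give a feel for how quickly the number of points grows, the following is a minimal sketch. The value counts are illustrative assumptions (three integer inputs, each limited to 1 through 1000), not figures taken from the paper.

```python
from math import prod

# Hypothetical example: number of admissible values for each input variable X_j.
# These counts are illustrative assumptions, not taken from the paper.
values_per_variable = [1000, 1000, 1000]  # e.g., three integers, each in 1..1000

# N is the product of the numbers of possible values for each X_j.
N = prod(values_per_variable)
print(N)  # 1000000000
```

Even this modest assumed range yields a billion points, which is why assigning a probability to every individual point is impractical.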

The operational profile and its probability distribution.
If we consider that any given operational use of the
program is equivalent to selection of exactly one of
the $E_i$, then the Input Data Space, as a whole, is
properly viewed as a specification of an operational
profile which characterizes (in fact, explicitly
defines) expected operational usage. Furthermore,
it is at least intuitively obvious that during
operational use the program is more likely to be
exposed to certain $E_i$ than to others. In other
words, if we set out to assign a "probability of
selection" to each $E_i$, we would create an
operational profile probability distribution, $P(E_i)$,
which would rarely, if ever, be uniform.

Unfortunately, as indicated above, the number
of $E_i$ can be extremely large, and the effort
required to create the corresponding probability
distribution would be far in excess of available
resources. The immediate consequence of this fact
is painfully clear. We seek a meaningful way to
specify expected operational program usage, but
find that the most obvious approach leads to more
trouble than we can handle. Fortunately, there is
a way of substantially reducing the difficulty
through corresponding reduction in the number of
"points" to which a probability of selection must
be assigned. The reduction is effectively
accomplished by viewing the Input Data Space as
a collection of subspaces, that is, as disjoint
subsets, $S_k$, each containing a perhaps large
number of points, $E_i$, sharing a common
characteristic. We could, for example, define $S_1$
as the set of all
$E_i = (X_1, X_2, \ldots, X_j, \ldots, X_n)_i$ such
that the value of $X_j$ is less than some specified
value $X$, and define $S_2$ as the set of points for
which $X_j \geq X$. To do so clearly identifies a
partition of the Input Data Space, $E$, for which
$E = S_1 \cup S_2$ and $S_1$ and $S_2$ are disjoint
(i.e., $S_1 \cap S_2 = \emptyset$). If we are careful
in our choice of $X_j$ and specify a value $X$ which
has significant meaning with respect to expected
operational usage, then it is possible to
characterize expected usage as (for example):

$$P(S_1) = P(X_j < X) = .8 \quad \text{and} \quad P(S_2) = P(X_j \geq X) = .2$$

This approach to characterizing expected
operational usage involves the identification of
selected intervals in the range of one or more of
the input variables, $X_j$. The resulting partition
of $E$ into disjoint subsets, $S_k$, is thus defined
explicitly in terms of the combinations of the
disjoint intervals to which input values must
belong.
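The interval-based partition can be sketched in a few lines. The input space, the threshold, and the choice of variable below are hypothetical; only the probabilities .8 and .2 come from the example in the text.

```python
from itertools import product

# Hypothetical input space: two variables, each ranging over 1..10 (assumption).
E = list(product(range(1, 11), repeat=2))

X = 4   # hypothetical threshold on the chosen variable
j = 0   # index of the variable X_j used to split the space

S1 = [e for e in E if e[j] < X]    # points with X_j < X
S2 = [e for e in E if e[j] >= X]   # points with X_j >= X

# S1 and S2 are disjoint and together cover E, so they form a partition.
assert not set(S1) & set(S2)
assert set(S1) | set(S2) == set(E)

# Probabilities are assigned from knowledge of expected usage,
# not from the sizes of the subsets.
profile = {"S1": 0.8, "S2": 0.2}
```

Note that in this sketch the smaller subset carries the larger probability, which illustrates that the profile reflects usage rather than the count of points in each subset.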

There is another approach which will produce an
equally useful but, in general, entirely different
partition of $E$. To permit a ready distinction
between types of partitions in later discussion,
we denote this latter type as the G-partition.
Subsequent reference to the Z-partition is meant
to imply that the remarks apply to both the
S-partition and G-partition described here. The
primary principle involved in definition of the
G-partition is the identification of functional
characteristics of program usage in a way that
allows functional groupings, $G_k$, of $E_i$ to be
specified. One approach to accomplishing this task
involves examination of the program and
identification of disjoint sets of structural elements
(e.g., logic paths). In general, if we can indeed
identify all of a program's structural elements
and further attribute to each a statement of
specific function performed, then we may rather
naturally collect the structural elements into
sets of elements of similar or related function.
Alternatively, and preferably, we may begin with
specified statements of functions to be performed
by the program, and systematically define the
contents of the corresponding sets by determining to
which statement of function each structural
element belongs.

The following discussion of logic paths is
provided to illustrate the detailed attention which
must be given to definition of program structural
elements as necessary to support subsequent
definition of the G-partition (i.e., subsets, $G_k$, of
$E$) and corresponding operational profile
probability distribution, $P(G_k)$.



A logic path of a program can be defined as a
sequence of adjacent segments, beginning at an
entry segment and proceeding by logical transfers
to an exit segment. A segment is defined as
follows:

1) It is a sequence of contiguous executable
statements for which all statements in the
segment will be executed if and only if the
first statement is executed.

2) It begins with a statement to which control
can be transferred and ends with a statement
which transfers control to an
adjacent segment.

An entry segment has no predecessor segments in
the program and an exit segment results in
termination or return of control to a calling program.

A logic path may be defined also in terms of
the sequence of transfers of control that take
place. Thus, the occurrence of the first transfer
implies that an entry segment is executed followed
by execution of an adjacent segment.

In general, even small programs could have a 
very large number of logic paths owing to the 
existence of loops [3]. The "large number of logic 
paths" problem has received a good deal of study in 
recent years, however, and mathematically rigor- 
ous, graph-theoretic techniques have been developed 
as necessary to reduce the problem to manageable 
proportions [4]. 



Let's assume that a particular program contains
$M$ logic paths of the type described above. We may
label the paths, $L_m$, and view the total collection
(i.e., $L_1, L_2, \ldots, L_M$) as a "logic path space"
($L$) which completely defines the program. Now,
much as we did earlier with the input space, $E$, we
are prompted to find relationships and conditions
which support subdivision of $L$ into a smaller
number of disjoint sets. To do so involves
identification of subspaces, $L_k$, containing a perhaps
large number of logic paths, $L_m$, which share a
common characteristic. We could, for example,
define the subset $L_1$ as the set of all $L_m$ which
contain a particular transfer (from the above
definition for logic path), and define the subset
$L_2$ as the set of all $L_m$ which do not contain the
particular transfer. We thus define a two-way
partition of the logic path space such that
$L_1 \cup L_2 = L$ and $L_1 \cap L_2 = \emptyset$.
There are clearly more elaborate criteria (e.g.,
all $L_m$ containing two particular transfers, all $L_m$
containing one but not both, and all $L_m$ containing
neither) which will result in a larger number of
disjoint sets of $L_m$. In general, we need to
establish the criteria in such a way that a unique
function or event can be subsequently attributed
to each of the subsets.
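The two-way partition by a particular transfer can be illustrated with a small sketch. The segment graph below is a hypothetical, loop-free example invented for illustration; it is not the program analyzed later, and the depth-first enumeration is one simple way to list paths rather than the graph-theoretic technique of Reference 4.

```python
# Hypothetical acyclic segment graph: each key is a segment, each value lists
# the segments to which control may transfer. Segment names are assumptions.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"], "E": []}

def logic_paths(graph, entry, exits):
    """Enumerate entry-to-exit paths by depth-first search (no loops assumed)."""
    stack, paths = [[entry]], []
    while stack:
        path = stack.pop()
        if path[-1] in exits:
            paths.append(tuple(path))
            continue
        for nxt in graph[path[-1]]:
            stack.append(path + [nxt])
    return sorted(paths)

def contains_transfer(path, transfer):
    """True if the path takes the given (from, to) transfer of control."""
    return transfer in zip(path, path[1:])

L = logic_paths(graph, "A", {"E"})
transfer = ("C", "D")
L1 = [p for p in L if contains_transfer(p, transfer)]      # paths using the transfer
L2 = [p for p in L if not contains_transfer(p, transfer)]  # paths avoiding it
```

By construction every path lands in exactly one of the two subsets, so the subsets are disjoint and together cover the logic path space.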

The next step is to derive a probability
distribution, $P(L_k)$, which estimates the relative
frequency of occurrence of the function or event
associated with each of the $L_k$. In particular,
from a knowledge of expected operational usage, one
must determine the probability that an established
set of criteria will be satisfied and,
consequently, a program logic path from $L_k$ will be
executed.

Finally, it is important to note that a given $E_i$
will cause exactly one logic path, $L_m$, to be
executed. On the other hand, there may be many $E_i$
which cause the same logic path to be executed. Thus,
we may define a G-partition of $E$, where each $G_k$
contains all $E_i$ which cause execution of one of the
paths in $L_k$. It follows from the above that the $G_k$
are disjoint and collectively contain all of $E$.
Moreover, note that the desired operational profile
probability distribution, $P(G_k)$, is identical to the
previously established $P(L_k)$.
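A minimal sketch of how a G-partition follows from path subsets is given below. The toy program, its path labels, and the grouping of paths into subsets are all hypothetical; the sketch assumes the program can be instrumented to report which logic path an input executes.

```python
from collections import defaultdict

def path_of(x):
    """Hypothetical program instrumented to report the logic path it executes."""
    if x < 0:
        return "L_neg"
    if x == 0:
        return "L_zero"
    return "L_even" if x % 2 == 0 else "L_odd"

# Hypothetical grouping of logic paths into subsets L_k (assumption).
path_class = {"L_neg": "invalid", "L_zero": "invalid",
              "L_even": "positive", "L_odd": "positive"}

E = range(-3, 6)            # hypothetical input data space
G = defaultdict(list)
for e in E:                 # each input executes exactly one path
    G[path_class[path_of(e)]].append(e)
```

Because each input maps to exactly one path and each path to exactly one subset, the resulting groups are disjoint and cover the whole input space.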

In summary, there are at least two (and perhaps
more) general approaches which permit a partition
of the Input Data Space into a small number of
disjoint subsets, $Z_j$, $j = 1, 2, \ldots, K$. Both
approaches reflect prior knowledge about expected
operational usage and, therefore, directly support
specification of the operational profile
probability distribution, $P(Z_j)$, $j = 1, \ldots, K$.
Subsequently, the probability distribution can be
used (as shown below) to obtain a meaningful
estimate of a program's operational usage reliability
as well as a measure of confidence in the estimate.

Reliability. Each input data subset, $Z_j$, may be
further partitioned into two disjoint subsets $Z_j'$
and $Z_j''$ such that

$$Z_j = Z_j' \cup Z_j''$$

$Z_j'$ is defined as the set of points in $Z_j$
corresponding to correct execution of the program, and
$Z_j''$ contains the rest of the points in $Z_j$ (i.e.,
those from which failures occur). Since the two
subsets are disjoint, it follows that
$P(Z_j') + P(Z_j'') = P(Z_j)$ and we may define the
theoretical reliability, $R$, of a program as follows:

$$R = \sum_j P(Z_j') = 1 - \sum_j P(Z_j'')$$



Unfortunately, short of exhaustive testing, it is
impossible to precisely determine the contents of
$Z_j'$ and $Z_j''$ or the exact values of $P(Z_j')$ and
$P(Z_j'')$. At best, we are able to estimate the
contents and assign probabilities based upon actual
observations. For example, if a program is executed a
total of $n$ times and if $f_j$ failures are observed
out of $n_j$ runs using points from $Z_j$, then an
estimate of program operational usage reliability is
obtained from:

$$\hat{R} = 1 - \sum_j \frac{f_j}{n_j} P(Z_j) \qquad (1)$$



The estimator, $\hat{R}$, is seen to be unbiased since the
expected value of $f_j$ is $n_j P(Z_j'')/P(Z_j)$ and, by
direct substitution, the expected value of $\hat{R}$ is:

$$E(\hat{R}) = 1 - \sum_j P(Z_j'') = R$$
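The substitution can be written out step by step. This is a sketch under the text's stated expectation for the failure counts, i.e., that each of the $n_j$ runs drawn from the $j$th subset fails with probability $P(Z_j'')/P(Z_j)$; the hat on $R$ and the double-prime marker for the failure subset are notation assumed here.

```latex
\begin{aligned}
E(\hat{R}) &= 1 - \sum_j \frac{E(f_j)}{n_j}\, P(Z_j) \\
           &= 1 - \sum_j \frac{n_j\, P(Z_j'')/P(Z_j)}{n_j}\, P(Z_j) \\
           &= 1 - \sum_j P(Z_j'') = R
\end{aligned}
```

The run counts $n_j$ cancel in the second line, so the result holds for any allocation of tests with every $n_j > 0$.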



It is interesting to note that Equation (1) reduces
to a more easily recognized and very popular
estimator if we do nothing more than assume that the
test cases (i.e., the $n$ executions of the program)
are identically proportional to the operational
profile probability distribution. That is, if we
assume

$$n_j = n \cdot P(Z_j)$$

then

$$1 - \sum_j \frac{f_j}{n_j} P(Z_j) = 1 - \sum_j \frac{f_j}{n \cdot P(Z_j)} \cdot P(Z_j)$$

so that

$$\hat{R} = 1 - \frac{1}{n} \sum_j f_j \qquad (2)$$

There is, however, very good reason to question 
the validity of the above assumption in the gen- 
eral case. In fact, some very popular testing 
strategies in use today regularly result in a set 
of test cases that is vaguely, if at all, repre- 
sentative of expected operational usage. 
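The difference between the two estimators can be seen in a short sketch. The function names, the two-subset profile, and the failure and run counts are hypothetical values chosen for illustration; every subset is assumed to have at least one run.

```python
def operational_reliability(failures, runs, profile):
    """Equation (1): 1 - sum_j (f_j / n_j) * P(Z_j)."""
    return 1 - sum(f / n * p for f, n, p in zip(failures, runs, profile))

def assessed_reliability(failures, runs):
    """Equation (2): 1 - (sum_j f_j) / n."""
    return 1 - sum(failures) / sum(runs)

# Hypothetical data (assumption): two subsets with profile (.8, .2).
profile = [0.8, 0.2]
failures = [1, 2]

proportional = [80, 20]   # n_j = n * P(Z_j): test mix matches the profile
skewed = [20, 80]         # test mix that does not match the profile

r1_prop = operational_reliability(failures, proportional, profile)
r2_prop = assessed_reliability(failures, proportional)
r1_skew = operational_reliability(failures, skewed, profile)
r2_skew = assessed_reliability(failures, skewed)
```

With the proportional mix the two estimates coincide; with the skewed mix, Equation (2) is unchanged while Equation (1) drops, because the failure observed in the heavily used subset now rests on fewer runs.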

Reliability estimates derived from (1) and (2)
are referred to as "the operational usage
reliability" and "the observed (or assessed)
reliability" and denoted as $R_1$ and $R_2$,
respectively, in the following discussion. It is
important to remember the fundamental difference
between the two: that is, $R_1$ incorporates
information about expected operational program usage
in order to give proper "weight" [i.e., $P(Z_j)$] to
categorized success/fail experiences (i.e.,
$f_j/n_j$) observed from program testing; $R_2$ carries
with it the assumption that the accomplished testing
is representative of expected operational usage and
thus assumes that the individual contributions to
$\sum_j f_j$ are properly weighted.

In either case, the reliability estimator is 
nothing more than a mathematical formula serving 
to extrapolate from a fixed number, n, of observed 
experiences (tests) to an unspecified number of 
expected experiences (operational uses of the 
program). When testing is, indeed, not represen- 
tative of expected operational usage, we have 
reason to become concerned about the validity of 
the extrapolation. The next section introduces a 
technique for quantifying that concern in a way 
that provides a measure of confidence in the 
estimate of operational usage reliability. 



Measuring the representativeness of testing. If we
have in hand a set of $n$ test cases and have done
the work necessary to define a partition of the
Input Data Space, $E$, and produce the operational
profile probability distribution, $P(Z_j)$, then the
degree to which the testing is representative of
operational usage is given by:

$$\chi^2 = \sum_{j=1}^{K} \frac{[n_j - n \cdot P(Z_j)]^2}{n \cdot P(Z_j)} \qquad (3)$$

Use of the chi-square formulation yields a single,
quantified measure of the extent to which the
observed frequencies, $n_j$, are proportional to the
theoretical frequencies, $n \cdot P(Z_j)$, which would
(theoretically) be obtained if a random sample of
$n$ tests were drawn from the Input Data Space in
accordance with $P(Z_j)$. Clearly, if actual testing
is very representative of expected operational
usage, then $n_j \approx n \cdot P(Z_j)$ for
$j = 1, 2, \ldots, K$, and the numerators in (3) will
have values near zero. The resulting $\chi^2$ value will
therefore be very small and (entering a table of
$\chi^2$ with $K-1$ degrees of freedom) a very high
confidence in the reliability estimate will be
obtained. On the other hand, very non-representative
testing will yield a large $\chi^2$ value and, from the
table of $\chi^2$, the corresponding level of confidence
may well be "significantly" low.

Finally, it is important to note that the $\chi^2$
computation provides both 1) a measure of our
confidence in the operational usage reliability
estimate, and 2) information which directly
supports the design of supplementary test cases
as necessary to decrease the differences,
$n_j - n \cdot P(Z_j)$. If properly designed, the
additional test cases have the net effect of driving
the $\chi^2$ value closer to zero and yielding increased
confidence in subsequently computed operational
usage reliability estimates.
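Both uses of the chi-square computation can be sketched briefly. The function names, profile, and test counts are hypothetical; the statistic is the standard chi-square goodness-of-fit sum of squared differences divided by the theoretical frequencies, as described in the text.

```python
def chi_square(counts, profile):
    """Equation (3): sum_j (n_j - n*P(Z_j))**2 / (n*P(Z_j))."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, profile))

def differences(counts, profile):
    """n_j - n*P(Z_j) per subset; negative values mark under-tested subsets."""
    n = sum(counts)
    return [c - n * p for c, p in zip(counts, profile)]

# Hypothetical profile and test counts (assumptions for illustration).
profile = [0.8, 0.2]
counts = [5, 5]

x2 = chi_square(counts, profile)
gaps = differences(counts, profile)
```

Here the first subset is under-tested relative to the profile, so supplementary cases would be drawn from it. The differences are relative to the current total, so they need to be recomputed after each batch of added tests.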

Representative testing and reliability 
measurement - an application 

The preceding discussion has presented ample
background information and sufficiently definitive
treatment of fundamentals to support a general
understanding of the primary elements of a
representative testing and reliability measurement
methodology. In order to more clearly depict how
the methodology might be employed, a sample
application is presented below. The selection of the
sample program was guided, in part, by a substantial
amount of previously accomplished structural
testing analysis [5]. The program corresponds to
a problem which Fred Gruenberger [6] stated pretty
much as follows: "Determine whether three integers
representing three lengths constitute an
equilateral, isosceles, or scalene triangle or cannot be
the sides of any triangle." Reference 5 provides
a detailed description of a Triangle Type
Determination Program (TTDP) and presents rationale
for and results from application of certain
structural-exercise test effectiveness measurement
tools belonging to TRW's Product Assurance
Confidence Evaluator (PACE) system.

Figure 1 presents a detailed flow diagram of TTDP,
and additional descriptive information and
structural analysis results are given here in
Figures 2 through 5.




Figure 1. Flow Diagram of TTDP 




Figure 2. Network of Node-to-Node Branching
Potential

Identification    Path Representation

P1                1-2-3-4-5-6-7-13-16-18-20
P2                1-2-3-5-7-13-14-12
P3                1-2-3-5-7-13-14-15
P4                1-3-4-5-7-13-16-17-12
P5                1-3-4-5-7-13-16-17-15
P6                1-3-5-6-7-13-16-18-19-12
P7                1-3-5-6-7-13-16-18-19-15
P8                1-3-5-7-8-9-10-11
P9                1-3-5-7-8-9-10-12
P10               1-3-5-7-8-9-12
P11               1-3-5-7-8-12

Figure 3. Logical Paths Through the Program

Figure 4. Test Case Input Values for Path
Operation

Subset of Paths to Guarantee Usage of All
Statements:

P1, P8, P3, P4, P6

Subset of Paths to Guarantee Usage of All Branches:

P1, P8, P3, P4, P6, P2, P5, P7, P9, P10, P11



Figure 5. Path Subsets for Program Testing 
Coverage 



Now let's suppose that there is an actual, 
ultimate user of TTDP; that is, there exists a 
person somewhere who awaits completion of TTDP 
program development and testing so that the program 
can be put to work solving real triangle type 
determination problems. Let's further assume: 

1) the user has been doing triangle type 
determination by hand for quite a while 
and, therefore, has a good idea of the 
general types of problems to which the 
program will be exposed during operational 
usage, and 

2) the user has requested, paid for and used 
software in the past and, in view of some 
unpleasant experience, now seeks a con- 
vincing argument and some assurance that 
TTDP will not fail to meet the specified 
needs in the operational environment.

Structural element testing strategy. The dominant
theme in Reference 5 focused on techniques for
developing a set of test cases which served to
accomplish a certain (albeit limited) objective,
i.e., assurance that each and every structural
element would be exercised at least one time during
execution of the program with the complete set of
test cases. Preparation of the test cases was
influenced by only one factor which was in any way
descriptive of expected operational usage of the
program; i.e., it was assumed that the program
would be presented with only positive integer values.

It is interesting, but perhaps a little disturbing,
to consider what kinds of statements can be made
about the reliability of TTDP upon successful
execution of the test cases. One might be
prompted, for example, to state: "The program has
been tested with 5 (11) test cases as required to
cause all statements (branches, paths) to be
exercised at least once with selected inputs. The
program operated successfully producing correct
results for all 5 (11) test cases; thus, the best
estimate of the probability of failure during
operational usage is 0 and the probability of
successful operation (reliability) is 1." This
statement is, of course, not much better than the
age-old: "It worked OK for me, so it ought to work OK
for you."

It is possible to add some punch to the
statement through systematic application of the
techniques described in the preceding section. We
assumed, above, that the user was intimately
familiar with the problems which TTDP is supposed
to handle. If we look for ways in which expected
operational usage might be characterized, there
are several obvious ones that serve our immediate
purpose. Suppose, for instance, that the user
reflected for a while on past experience with
triangle type determination and was able (with
some reasonable degree of confidence) to state:

About half of the time, the three integers
do not represent a triangle. About one-third
of the time, the triangle is isosceles;
one-tenth of the time, it's scalene. The
rest of the time it turns out to be
equilateral.

This statement, in effect, defines a four-way,
functional partition of the input domain such that:

$G_1$ = set of all combinations of positive integer
values which do not form a triangle

$G_2$ = set of all combinations which form an
isosceles triangle

$G_3$ = set of all combinations which form a
scalene triangle

$G_4$ = set of all combinations which form an
equilateral triangle

Furthermore, if it is assumed that past experience
is representative of expected future TTDP usage,
then we may assign fractional values indicating the
probability that a given combination of integer
values will belong to one of the sets as follows:

$P(G_1) = .5$

$P(G_2) = .333$

$P(G_3) = .1$

$P(G_4) = 1 - [P(G_1) + P(G_2) + P(G_3)] = 1 - .933 = .067$

The set of $P$ values is one instance of the
operational profile probability distribution treated
at length in the earlier discussion.

It is a relatively simple matter now to examine
the actual test case input values (Figure 4) and
corresponding output results and classify each
test according to the set, $G_k$, to which it belongs.
Doing so for the five test cases (Figure 5) yields
the following frequencies:

$$n_1 = 2, \quad n_2 = 1, \quad n_3 = 1, \quad n_4 = 1$$

Applying the previously discussed $\chi^2$ formulation,
we obtain:

$$\chi^2 = \sum_{k=1}^{4} \frac{[n_k - nP(G_k)]^2}{nP(G_k)}$$

$$\chi^2 = .1 + .265 + .5 + 1.32$$

$$\chi^2 = 2.185$$



Similarly, we may examine the set of 11 test
cases which are necessary to exercise all branches
and logical paths derivable from the program
(Figures 1 and 2) and listed in Figure 3. Review
of test results with reference to Figures 4 and 5
yields:

$$n_1 = 6, \quad n_2 = 3, \quad n_3 = 1, \quad n_4 = 1$$

and

$$\chi^2 = \frac{[6 - 11(.5)]^2}{11(.5)} + \frac{[3 - 11(.333)]^2}{11(.333)} + \frac{[1 - 11(.1)]^2}{11(.1)} + \frac{[1 - 11(.067)]^2}{11(.067)}$$

$$\chi^2 = .045 + .120 + .009 + .094$$

$$\chi^2 = .268$$
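The two chi-square values can be rechecked with a few lines of code. The profile and the frequencies are those given in the text; the function name is an assumption, and small differences in the third decimal place arise because the text sums terms that were rounded individually.

```python
profile = [0.5, 0.333, 0.1, 0.067]   # P(G_1)..P(G_4) as stated in the text

def chi_square(counts, profile):
    """Sum over subsets of (n_k - n*P(G_k))**2 / (n*P(G_k))."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, profile))

x2_five = chi_square([2, 1, 1, 1], profile)     # five-case (statement) set
x2_eleven = chi_square([6, 3, 1, 1], profile)   # 11-case (branch/path) set
```

The smaller value for the 11-case set reflects its closer match to the stated operational profile.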



Finally, a direct comparison of the $\chi^2$ values
(i.e., 2.185 versus .268) causes us to conclude
that the set of 11 test cases is more representative
of expected operational usage than is the set
of five test cases. On the other hand, we tend to
be less critical of the five cases if we consider
another five-case set, consisting of all equilateral
triangles, for which the resulting $\chi^2$ value is
approximately 69.5. Similarly, a set of 11
equilateral triangle test cases would yield a whopping
big $\chi^2$ of 153.2, making the measured value of .268
look very good. In fact, a little more arithmetic
makes us feel even better when it's seen that the
minimum $\chi^2$ value achievable with 11 test cases is
.179 [since it is impossible to obtain
$n_k = 11 \cdot P(G_k)$ for all $k$].
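The "little more arithmetic" can be reproduced by brute force. The sketch below searches every integer allocation of 11 tests over the four sets; the function names are assumptions, and exhaustive search is used here only because the problem is tiny, not because the paper prescribes it.

```python
from itertools import product

profile = [0.5, 0.333, 0.1, 0.067]   # P(G_1)..P(G_4) as stated in the text

def chi_square(counts, profile):
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, profile))

def best_allocation(n, profile):
    """Exhaustively search all integer allocations of n tests to the subsets."""
    k = len(profile)
    candidates = (c for c in product(range(n + 1), repeat=k) if sum(c) == n)
    return min(candidates, key=lambda c: chi_square(c, profile))

best = best_allocation(11, profile)
x2_min = chi_square(best, profile)

# The all-equilateral sets mentioned in the text, for comparison.
x2_all_equilateral_5 = chi_square([0, 0, 0, 5], profile)
x2_all_equilateral_11 = chi_square([0, 0, 0, 11], profile)
```

The minimum cannot reach zero because test counts are whole numbers while the theoretical frequencies for 11 tests are not.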

If we now enter a table of $\chi^2$ with three degrees
of freedom, we find that the probability of
obtaining a value of $\chi^2$ as large as or larger than
.352 is .95. Since the observed value of .268
falls close to the tabled value of .352, the
probability of the obtained value is slightly greater
than .95. Similarly, the probability of the
observed value of 2.185 is slightly greater than
.5. We are now ready to return to the previous
statement about the reliability of TTDP and modify
the last sentence as follows:

"The program operated successfully producing
correct results for all 11 test cases; thus,
the assessed failure ratio is 0/11 and the
assessed reliability is

$$R_2 = 1 - \frac{0}{11}$$

where the $\chi^2$ value is a simple measure of the
degree to which the actual test cases are
representative of the defined operational profile
probability distribution"!

The best estimate of the probability of
success during operational usage is derived
from:

$$R_1 = 1 - \sum_{k=1}^{4} \frac{f_k}{n_k} P(G_k)$$

where $P(G_k)$ is the probability assigned to the
$k$th set of the four sets constituting the
operational profile, $n_k$ is the number of
test cases belonging to the $k$th set, and
$f_k$ is the number of test cases (from the $k$th
set) which failed. Thus, the predicted
operational usage reliability, $R_1$, is
estimated as:

$$R_1 = 1 - \left[ \frac{0}{6}(.5) + \frac{0}{3}(.333) + \frac{0}{1}(.1) + \frac{0}{1}(.067) \right] = 1$$

i.e., $R_1 = 1$ with confidence slightly greater
than .95. Similarly, from the set of five
test cases, $R_1 = 1$ with confidence slightly
greater than .5."

It is worth a little more effort to see what
happens if not all of the test cases were
successful. In actual practice, it may well be
required that all test cases produce satisfactory
results. For purposes of discussion, however,
let's assume that one of the 11 test cases (e.g.,
one of the isosceles sets of integer inputs)
causes TTDP to fail and, for one reason or another,
the problem has not been fixed. Obviously, the
assessed failure ratio would then be 1/11 and the
assessed reliability would be $R_2 = 1 - 1/11 = .909$.
The predicted operational usage reliability,
however, is obtained by computing:



$$R_1 = 1 - \left[\frac{0}{6}(.5) + \frac{1}{3}(.333) + \frac{0}{1}(.1) + \frac{0}{1}(.067)\right] = .889$$

Thus, $R_1 = .889$ with confidence slightly greater
than .95.

Notice, however, that if the single failure had
come from the equilateral triangle test case, we
would still obtain

$$R_2 = 1 - \frac{1}{11} = .909,$$

but the predicted operational usage reliability
would be higher, i.e.,

$$R_1 = 1 - \frac{1}{1}(.067) = .933$$

with unchanged confidence slightly greater than
.95 based on the measure of the representativeness
of the accomplished testing.
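
The two reliability figures can be reproduced with a minimal sketch. The function names below are hypothetical; the profile and the per-set frequencies are those of the example, and the guard for an untested set (treating 0/0 as 0) is a convention assumed here so that the function is total.

```python
def assessed_reliability(failures, counts):
    """R_2: one minus the overall failure ratio."""
    return 1 - sum(failures) / sum(counts)

def predicted_reliability(failures, counts, profile):
    """R_1: one minus the profile-weighted sum of per-set failure ratios f_k/n_k."""
    weighted = 0.0
    for f_k, n_k, p in zip(failures, counts, profile):
        ratio = f_k / n_k if n_k else 0.0  # assumption: an untested set contributes 0
        weighted += ratio * p
    return 1 - weighted

profile = [0.5, 0.333, 0.1, 0.067]  # P(G_1)..P(G_4)
counts = [6, 3, 1, 1]               # n_1..n_4

# One failure in the set with n = 3 and P = .333 (the isosceles case)
print(round(assessed_reliability([0, 1, 0, 0], counts), 3))            # 0.909
print(round(predicted_reliability([0, 1, 0, 0], counts, profile), 3))  # 0.889

# One failure in the set with n = 1 and P = .067 (the equilateral case)
print(round(predicted_reliability([0, 0, 0, 1], counts, profile), 3))  # 0.933
```

The assessed value depends only on how many tests failed, whereas the predicted value also depends on how likely the failing set is to occur in operation.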

The preceding discussion and derivation
exemplifies one approach to improving upon an
otherwise uncertain statement about the operational
reliability of TTDP. Note that the choice of the
particular approach and the definition of the
functional partition of the input domain was
prompted directly by the form of the user's
statement about expected operational usage. The
statement itself was based upon 1) a good deal of past
experience with triangle type determination and,
2) the user's ability to characterize that
experience in a quantified, meaningful way.

We may approach the problem in a slightly
different manner by exploring at least one
alternate form of the user's statement. For example,
let's assume that the integer values represent
rank scores of a student on a series of three
separate examinations. To simplify our
computations, let's also assume that the rank scores
range from 1 to 4. Now if we define max as the
maximum difference between two of the three
scores, i.e.:

$$\mathit{max} = \max\{\,|I-J|,\ |I-K|,\ |J-K|\,\}$$

then the following operational usage partition of
the input domain is possible:

$S_1$ = set of all combinations of rank scores
for which max = 0

$S_2$ = set of all combinations of rank scores
for which max = 1

$S_3$ = set of all combinations of rank scores
for which max = 2

$S_4$ = set of all combinations of rank scores
for which max = 3

Furthermore, if the user knows enough about the
similarity of the three examinations and is aware
of the tendency of rank scores to "bunch up"
accordingly, it is entirely possible that the
following operational profile probability
distribution could be established:

$$P(S_1) = .3, \quad P(S_2) = .4, \quad P(S_3) = .2, \quad P(S_4) = .1$$
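
As a rough illustration of this partition, the sketch below classifies a triple of rank scores by its maximum pairwise difference and tallies how many triples fall in each set. The function names `usage_class` and `tally` are hypothetical, and because the 11 TTDP test cases are not listed here, the tally is run over the full input domain of 64 triples rather than over the actual test set.

```python
from collections import Counter
from itertools import product

def usage_class(i, j, k):
    """Return the 1-based index of the set S_1..S_4 for a triple of rank scores."""
    max_diff = max(abs(i - j), abs(i - k), abs(j - k))
    return max_diff + 1

def tally(cases):
    """Count how many triples fall in each of S_1..S_4."""
    counts = Counter(usage_class(*case) for case in cases)
    return [counts.get(s, 0) for s in (1, 2, 3, 4)]

# Every combination of three rank scores in the range 1..4
domain = list(product(range(1, 5), repeat=3))
print(tally(domain))  # [4, 18, 24, 18]
```

Under these assumptions the domain splits 4/18/24/18, so a sample drawn with equal likelihood from the input domain would not match the profile of .3, .4, .2, .1; the profile reflects the user's experience rather than a count of the input combinations.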



As in the preceding derivation, an examination of
the 11 test cases used to exercise all paths of TTDP
yields the following frequencies:

$$n_1 = 1, \quad n_2 = 6, \quad n_3 = 4, \quad n_4 = 0$$

We proceed as before and compute the $\chi^2$ value as:

$$\chi^2 = \sum_{k=1}^{4} \frac{[n_k - n P(S_k)]^2}{n P(S_k)}$$

$$\chi^2 = \frac{[1 - 11(.3)]^2}{11(.3)} + \frac{[6 - 11(.4)]^2}{11(.4)} + \frac{[4 - 11(.2)]^2}{11(.2)} + \frac{[0 - 11(.1)]^2}{11(.1)}$$

$$\chi^2 = \frac{(2.3)^2}{3.3} + \frac{(1.6)^2}{4.4} + \frac{(1.8)^2}{2.2} + \frac{(1.1)^2}{1.1}$$

$$\chi^2 = 1.603 + .582 + 1.473 + 1.1$$

$$\chi^2 = 4.758$$

From the table of $\chi^2$ values (again using 3 df), we
find that the observed value of $\chi^2$ falls close to
the table value of 4.642. The probability of
obtaining a value of $\chi^2$ as large as or larger than
4.642 is .2, and we may modify the earlier
proposed and once-modified statement about the
reliability of TTDP accordingly. That is, in the
event that all 11 test cases executed successfully,
and substituting S for G,

we observe $R_2 = 1$

and predict $R_1 = 1$ with confidence slightly
less than .2.
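
The table lookups can be cross-checked numerically. The sketch below is an assumption of this rewrite, not part of the original method: it uses the closed-form upper-tail probability of the $\chi^2$ distribution with three degrees of freedom, and the function name `chi2_tail_3df` is hypothetical.

```python
import math

def chi2_tail_3df(x):
    """P(chi-square with 3 degrees of freedom >= x), valid for 3 df only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Table values (.352, 4.642) and observed values (.268, 2.185, 4.758) from the example
for value in (0.268, 0.352, 2.185, 4.642, 4.758):
    print(value, round(chi2_tail_3df(value), 3))
```

The computed tail probabilities should agree with the tabled ones: about .95 at .352 and about .2 at 4.642, with the observed 4.758 falling slightly below .2.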



In the event that one of the test cases failed
(e.g., one of the four belonging to $S_3$) and
assuming that 0/0 = 0, we obtain:

$$R_2 = 1 - \frac{1}{11} = .909 \quad \text{(same as before)}$$

but

$$R_1 = 1 - \sum_{k=1}^{4} \frac{f_k}{n_k} P(S_k)$$

$$R_1 = 1 - \left[\frac{0}{1}(.3) + \frac{0}{6}(.4) + \frac{1}{4}(.2) + \frac{0}{0}(.1)\right]$$

$$R_1 = 1 - [0 + 0 + .05 + 0] = .95$$

Thus, the predicted operational usage reliability,
$R_1$, is estimated as .95 with confidence slightly
less than .2.

Before leaving the example, it should be pointed
out that there is a straightforward way to improve
(i.e., decrease the $\chi^2$ value and increase the
confidence) upon the current situation through
strategic addition of test cases to the original
set of 11. Notice, for example, that the largest
contribution to the above $\chi^2$ value resulted from
a noticeable difference between the observed
frequency and expected frequency of test cases
belonging to $S_1$. By simply adding one test case
satisfying the property max = 0, we obtain:

$$\chi^2 = \frac{[2 - 12(.3)]^2}{12(.3)} + \frac{[6 - 12(.4)]^2}{12(.4)} + \frac{[4 - 12(.2)]^2}{12(.2)} + \frac{[0 - 12(.1)]^2}{12(.1)}$$

$$\chi^2 = .711 + .300 + 1.067 + 1.2$$

$$\chi^2 = 3.278$$

with corresponding confidence slightly greater than .3.

Noticing that the largest contribution now comes
from the fourth term, we may add a 13th test case
(satisfying max = 3), which yields:

$$\chi^2 = \frac{1}{13}(12.033 + 1.6 + 9.8 + .9) = 1.872$$

with corresponding confidence of approximately .6.

Clearly, the process can be continued as long 
as necessary, thus ensuring that the evolving set 
of tests becomes sufficiently representative of 
expected operational usage to permit prediction 
of operational reliability with specified (and 
perhaps required) confidence. 
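
The incremental process just described might be sketched as follows. This is one possible reading, not a definitive procedure: the function names are hypothetical, and restricting the choice to under-represented sets (observed frequency below expected) is an assumption added here so that a test case is never added to a set that is already over-sampled.

```python
def contributions(counts, profile):
    """Per-set chi-square terms (n_k - n*P_k)^2 / (n*P_k)."""
    n = sum(counts)
    return [(n_k - n * p) ** 2 / (n * p) for n_k, p in zip(counts, profile)]

def add_test_case(counts, profile):
    """Add one case to the under-represented set with the largest chi-square term."""
    n = sum(counts)
    terms = contributions(counts, profile)
    under = [k for k, (n_k, p) in enumerate(zip(counts, profile)) if n_k < n * p]
    if not under:  # observed frequencies already match the expected ones exactly
        return list(counts)
    target = max(under, key=lambda k: terms[k])
    updated = list(counts)
    updated[target] += 1
    return updated

profile = [0.3, 0.4, 0.2, 0.1]  # P(S_1)..P(S_4)
counts = [1, 6, 4, 0]           # n_1..n_4 for the original 11 test cases

for _ in range(3):
    print(counts, round(sum(contributions(counts, profile)), 3))
    counts = add_test_case(counts, profile)
# [1, 6, 4, 0] 4.758
# [2, 6, 4, 0] 3.278
# [2, 6, 4, 1] 1.872
```

With the example's profile this reproduces the sequence 4.758, 3.278, 1.872 obtained by adding a 12th test case to $S_1$ and a 13th to $S_4$.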



Conclusions and recommendations 

We have provided both a detailed description and
a demonstrated application of a methodology which
relates software testing strategy to reliability
assessment. Considerable attention was given to
presenting the methodology in a way which would
both encourage and facilitate its application.
As indicated earlier, the representative testing
and reliability measurement methodology is but one
of the products of a substantial on-going research
activity. The research to date has focused on the
fundamental properties of software, of classical
and innovative testing technology, and of the
basic requirements for highly confident estimation
of software reliability.

The methodology as described is not limited in
application to small and trivial programs such as
the triangle type determination example. It is not
yet clear, however, just what kind of effort or how
much of it is required for effective application to
larger and more complex programs. There is, indeed,
a great deal more to be learned, and continued
research along the following lines is considered
to be essential:

• Investigation of difficulties and
development of techniques for deriving
and expressing operational usage profiles;
i.e., identification of appropriate
partitions of a program's Input Data Space
and development of corresponding probability
distributions as a function of software
performance requirements specifications and
other available documentation, if any.

• Definition of the responsibilities of the
typical participants in the software
procurement process as necessary to ensure
effective application of the methodology.
For example, investigate candidate approaches
to user specification of expected operational
usage, and establishment of minimum
acceptable levels of 1) testing
representativeness, 2) assessed and predicted
operational usage reliability, and 3) confidence
in the reliability estimate.

• Determination of potential cost,
quality and schedule impact of
operational usage testing with
respect to current methods of
retesting following software
modification; e.g., investigate
the application of the chi-square
test for representativeness
as an indication of
retesting sufficiency and
identify possible trade offs
between reduction in operational
usage due to selected
retesting of modified software
elements.

Implicit in these statements of required research 
are many as yet unanswered questions. In some 
respects this paper has only scratched the surface 
of the software reliability problem. It is hoped, 
however, that sufficiently detailed and careful 
treatment has been given to the representative 
testing and reliability measurement methodology 
to stimulate the reader's interest and to focus 
research on one or more of these unresolved issues.